Unified frame and segment based models for automatic speech recognition

نویسندگان

  • Hsiao-Wuen Hon
  • Kuansan Wang
چکیده

In this paper, we propose an analytically tractable framework that integrates the frame and segment based acoustic modeling techniques. We combine the two approaches by jointly modeling their respective hidden Markov processes. Since the joint process is based on the same mathematical framework, conventional search and training techniques, such as Viterbi and EM algorithms, can be directly applied. It also allows the score from either model to contribute to the training and decoding of the other, reaching a jointly optimal decision. We conducted two series of experiments to verify our hypotheses. In the phone-pair classification experiments, our segment models show a 24% error reduction over state-of-the-art HMM-based system. The superior quality of segment models contributes to an 8.2% reduction in word error rates for the unified system on the WSJ dictation task.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

بهبود عملکرد سیستم بازشناسی گفتار پیوسته بوسیله ویژگی‌های استخراج شده از مانیفولدهای گفتاری در فضای بازسازی شده فاز

The design for new feature extraction methods out of the speech signal and combination of their obtained information is one of the most effective approaches to improve the performance of automatic speech recognition (ASR) system. Recent researches have been shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...

متن کامل

On designing and evaluating speech event detectors

We study issues related to designing speech event detectors for automatic speech recognition. Event detection is a critical component of a recently proposed automatic speech attribute transcription (ASAT) paradigm for speech research. Similar to keyword spotting and non-keyword rejection, a good detector needs to effectively detect speech attributes of interest while rejecting extraneous events...

متن کامل

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract   Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

On the use of speech parameter contours for emotion recognition

Many features have been proposed for speech-based emotion recognition, and a majority of them are frame based or statistics estimated from frame-based features. Temporal information is typically modelled on a per utterance basis, with either functionals of frame-based features or a suitable back-end. This paper investigates an approach that combines both, with the use of temporal contours of pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000